[SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation #45234

erenavsarogullari · 2024-02-23T18:26:04Z

What changes were proposed in this pull request?

AQE can materialize both ShuffleQueryStage and BroadcastQueryStage during cancellation. This causes unnecessary stage materialization by submitting a Shuffle Job or Broadcast Job. If a stage has not been materialized yet (i.e. ShuffleQueryStage.shuffleFuture or BroadcastQueryStage.broadcastFuture is not initialized), cancellation should simply skip it rather than materialize it.

Problematic Stacktrace:

at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.submitShuffleJob(ShuffleExchangeExec.scala:104)
at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture$lzycompute(QueryStageExec.scala:210)
at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.shuffleFuture(QueryStageExec.scala:210)
at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.cancel(QueryStageExec.scala:223)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$cleanUpAndThrowException$1(AdaptiveSparkPlanExec.scala:905)

Sample use case:
1- Stage Materialization Steps:
When stage materialization fails:

1.1- ShuffleQueryStage1 - materialized successfully,
1.2- ShuffleQueryStage2 - materialization failed,
1.3- ShuffleQueryStage3 - not materialized yet, so ShuffleQueryStage3.shuffleFuture is not initialized

2- Stage Cancellation Steps:

2.1- ShuffleQueryStage1 - canceled because it is already materialized,
2.2- ShuffleQueryStage2 - is the earlyFailedStage, so AQE currently skips it by default because it could not be materialized,
2.3- ShuffleQueryStage3 - the problem is here: this stage is not materialized yet, but AQE currently tries to cancel it anyway, and cancelling requires the stage to be materialized first.
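The stack trace above shows `cancel` reaching `shuffleFuture$lzycompute`, i.e. cancellation forcing a lazy value. A minimal sketch of that effect, and of a guard that skips unstarted stages, might look like the following. `FakeJob`, `NaiveStage`, and `GuardedStage` are hypothetical stand-ins, not Spark classes, and the guard is only an illustration of the idea, not the patch itself.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical stand-in for a submitted shuffle/broadcast job.
final class FakeJob {
  @volatile var cancelled: Boolean = false
  def cancel(): Unit = { cancelled = true }
}

// Naive stage: cancel() touches the lazy val, so an unstarted stage
// submits its job only in order to cancel it.
class NaiveStage(submitted: AtomicInteger) {
  lazy val job: FakeJob = { submitted.incrementAndGet(); new FakeJob }
  def materialize(): FakeJob = job
  def cancel(): Unit = job.cancel() // forces `job` if it was never started
}

// Guarded stage: cancel() is a no-op unless materialize() was called.
class GuardedStage(submitted: AtomicInteger) {
  private val started = new AtomicBoolean(false)
  lazy val job: FakeJob = { submitted.incrementAndGet(); new FakeJob }
  def materialize(): FakeJob = { started.set(true); job }
  def cancel(): Unit = if (started.get()) job.cancel()
}
```

In this sketch, cancelling a fresh `NaiveStage` increments the submission counter, while cancelling a fresh `GuardedStage` leaves it at zero.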

Reproduce Steps:
https://github.com/apache/spark/pull/45234/files#diff-f89f2fe78b324c6bc7190bef84220181f3616efc156ea99b3f15d375a22d7f88R900

Why are the changes needed?

The current logic submits an unnecessary Shuffle Job / Broadcast Job just to be able to cancel a ShuffleQueryStage / BroadcastQueryStage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added new Unit Tests

Was this patch authored or co-authored using generative AI tooling?

No

erenavsarogullari · 2024-02-29T21:32:40Z

Should we also backport this patch to v3.4.x and v3.5.x?

ulysses-you · 2024-03-05T09:03:39Z

cc @cloud-fan @maryannxue as well

erenavsarogullari · 2024-04-22T04:38:39Z

Thanks @cloud-fan and @ulysses-you for the reviews and approval.
I have just rebased to get a green build.

erenavsarogullari · 2024-04-22T16:09:57Z

The build is green now, so the PR is ready to be merged. Thanks in advance.

…cancellation

cloud-fan · 2024-04-29T14:11:20Z

thanks, merging to master!

… the cancellation

Closes apache#45234 from erenavsarogullari/SPARK-47148.

Authored-by: erenavsarogullari <erenavsarogullari@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

### What changes were proposed in this pull request?

A followup of #45234 to make the test more stable by using broadcast hint.

### Why are the changes needed?

test improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47007 from cloud-fan/follow.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>

cloud-fan · 2024-07-19T09:11:38Z

sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala

+   * such as waiting for the subqueries.
+   */
+  @transient private lazy val shuffleFuture: Future[MapOutputStatistics] = executeQuery {
+    materializationStarted.set(true)


After a closer look, I don't think this change works as we expect. We set this materializationStarted flag before we return the Future, which means we are still on the AQE loop's main thread. That said, once we submit a query stage, its materializationStarted becomes true immediately and we can't really avoid the wasted query stage execution when cancelling it.

The test passed because ShuffleExchangeExec calls child.execute() before returning the Future. Then we exit the AQE loop without cancelling other stages.

Furthermore, I don't think this idea works. Say we want to cancel a query stage and the materializationStarted flag is false, so we decide to skip the cancellation. The flag may become true a moment later, and we then fail to cancel the shuffle job.
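The first point above, that a flag set before the Future is returned is set on the submitting thread, can be illustrated with a small sketch. `FlagThreadDemo` and its fields are hypothetical names used only for illustration and do not correspond to Spark code.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{ExecutionContext, Future}

object FlagThreadDemo {
  // Name of the thread that set the flag, and of the thread that ran the job body.
  val flagThread = new AtomicReference[String]()
  val jobThread = new AtomicReference[String]()

  def submit()(implicit ec: ExecutionContext): Future[Unit] = {
    // Runs synchronously on the caller, before any Future exists.
    flagThread.set(Thread.currentThread().getName)
    // Only this body is handed to the execution context.
    Future { jobThread.set(Thread.currentThread().getName) }
  }
}
```

When `submit()` is called from a thread that is not part of the execution context's pool, `flagThread` holds the caller's name, so the flag cannot tell "submitted" apart from "actually running".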

I think we need a bit of synchronization here. The shuffle node should have two fields: isCancelled flag and the shuffle job Future.

When we cancel a shuffle, we lock on the shuffle node, and set isCancelled flag to true. Then if the shuffle job Future is present, we cancel it.

When we are going to submit a shuffle, we lock on the shuffle node. Then: if the isCancelled flag is true, fail immediately, otherwise, submit the shuffle job and set the Future field.
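The locking scheme described above could be sketched roughly as follows. `CancellableNode`, `submitJob`, and `cancelJob` are hypothetical names introduced for illustration, and "fail immediately" is modelled here as returning an already-failed Future, which is one possible reading of the proposal.

```scala
import scala.concurrent.Future

// Hypothetical shuffle node holding the two fields from the proposal:
// an isCancelled flag and the (optional) shuffle job Future.
final class CancellableNode[T](submitJob: () => Future[T], cancelJob: Future[T] => Unit) {
  private var isCancelled = false
  private var jobFuture: Option[Future[T]] = None

  // Cancel: take the lock, set the flag, and cancel the job only if it exists.
  def cancel(): Unit = synchronized {
    isCancelled = true
    jobFuture.foreach(cancelJob)
  }

  // Submit: take the lock; fail if already cancelled, otherwise submit
  // the job once and remember its Future.
  def submit(): Future[T] = synchronized {
    if (isCancelled) {
      Future.failed(new IllegalStateException("stage was cancelled before submission"))
    } else {
      val f = jobFuture.getOrElse(submitJob())
      jobFuture = Some(f)
      f
    }
  }
}
```

Because both methods hold the same lock, a cancel either sees the Future and cancels it, or sets the flag first so that a later submit never starts the job.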

cc @erenavsarogullari @ulysses-you

ulysses-you · Jul 23, 2024

This does not seem to be an actual issue for now. AQE always materializes and cancels stages on the main thread, so once we decide to cancel stages, we will never materialize a stage again. We may need to improve this code if we support concurrent materialization in the future.

…ages

### What changes were proposed in this pull request?

We missed the fact that submitting a shuffle or broadcast query stage can be heavy, as it needs to submit subqueries and wait for the results. This blocks the AQE loop and hurts the parallelism of AQE.

This PR fixes the problem by using shuffle/broadcast's own thread pool to wait for subqueries and other preparations.

This PR also re-implements #45234 to avoid submitting the shuffle job if the query is failed and all query stages need to be cancelled.

### Why are the changes needed?

better parallelism for AQE

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new test case

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47533 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
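As a rough illustration of the idea in this commit message, the preparation work (such as waiting for subqueries) can be moved inside the Future so that the submitting loop returns immediately. `NonBlockingSubmit`, `prepare`, and `run` are hypothetical names; this is a sketch of the general technique, not the code of #47533.

```scala
import scala.concurrent.{ExecutionContext, Future}

object NonBlockingSubmit {
  // `prepare` stands for blocking preparation (e.g. waiting for subqueries);
  // `run` stands for the job itself. Both execute on the supplied pool,
  // so the caller is not blocked by either.
  def submitAsync[T](prepare: () => Unit)(run: () => T)(implicit ec: ExecutionContext): Future[T] =
    Future {
      prepare()
      run()
    }
}
```

The caller gets a Future back right away and can go on to submit other stages while the preparation is still in progress on the pool.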

github-actions bot added the SQL label Feb 23, 2024

LuciferYang requested a review from ulysses-you February 27, 2024 12:26

ulysses-you reviewed Feb 28, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (outdated, resolved)

ulysses-you reviewed Feb 28, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (outdated, resolved)

erenavsarogullari changed the title ~~[SPARK-47148][SQL] Avoid to materialize AQE ShuffleQueryStage on the cancellation~~ [SPARK-47148][SQL] Avoid to materialize AQE QueryStages on the cancellation Feb 29, 2024

erenavsarogullari changed the title ~~[SPARK-47148][SQL] Avoid to materialize AQE QueryStages on the cancellation~~ [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation Feb 29, 2024

erenavsarogullari force-pushed the SPARK-47148 branch 2 times, most recently from 53dd089 to a7a869f Compare March 1, 2024 23:35

cloud-fan reviewed Mar 5, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (outdated, resolved)

erenavsarogullari force-pushed the SPARK-47148 branch from a7a869f to ef8c50e Compare March 6, 2024 05:51

cloud-fan reviewed Mar 6, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (outdated, resolved)

cloud-fan reviewed Mar 6, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (outdated, resolved)

erenavsarogullari force-pushed the SPARK-47148 branch 2 times, most recently from 355aeb0 to a923c2a Compare March 7, 2024 03:23

cloud-fan reviewed Mar 8, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (outdated, resolved)

cloud-fan reviewed Mar 11, 2024

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (outdated, resolved)

cloud-fan reviewed Mar 11, 2024

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (outdated, resolved)

erenavsarogullari force-pushed the SPARK-47148 branch 2 times, most recently from 867134b to edf95b3 Compare March 18, 2024 23:40

erenavsarogullari force-pushed the SPARK-47148 branch 3 times, most recently from c178f2e to d0e4127 Compare March 30, 2024 22:54

cloud-fan reviewed Apr 4, 2024

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (outdated, resolved)

cloud-fan reviewed Apr 4, 2024

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (outdated, resolved)

cloud-fan reviewed Apr 4, 2024

sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (outdated, resolved)

erenavsarogullari force-pushed the SPARK-47148 branch 2 times, most recently from aaafa1d to 4378034 Compare April 20, 2024 01:02

cloud-fan approved these changes Apr 22, 2024

erenavsarogullari force-pushed the SPARK-47148 branch from 4378034 to c17240b Compare April 22, 2024 04:34

cloud-fan reviewed Apr 23, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (outdated, resolved)

cloud-fan reviewed Apr 23, 2024

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (outdated, resolved)

erenavsarogullari added 2 commits April 28, 2024 18:08

SPARK-47148 - Avoid to materialize AQE ExchangeQueryStageExec on the …

7950208

…cancellation

SPARK-47148 - Refactor AQE APIs

1105282

erenavsarogullari force-pushed the SPARK-47148 branch from ae77cbb to 1105282 Compare April 29, 2024 01:09

cloud-fan approved these changes Apr 29, 2024

cloud-fan closed this in d913d1b Apr 29, 2024

cloud-fan mentioned this pull request Jun 18, 2024

[SPARK-47148][SQL][FOLLOWUP] Use broadcast hint to make test more stable #47007

Closed

cloud-fan reviewed Jul 19, 2024

cloud-fan mentioned this pull request Jul 30, 2024

[SPARK-49057][SQL] Do not block the AQE loop when submitting query stages #47533

Closed

ulysses-you Jul 23, 2024
